docs: advanced guides for conflict resolution and error handling (#191) #302

halotukozak wants to merge 1 commit into master
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##        master    #302   +/- ##
=========================================
  Coverage      ?  42.03%
=========================================
  Files         ?      35
  Lines         ?     433
  Branches      ?       0
=========================================
  Hits          ?     182
  Misses        ?     251
  Partials      ?       0
=========================================
```
Pull request overview
This PR adds three new “advanced guides” to the Alpaca documentation, covering parser conflict resolution, contextual parsing via lexer/parser context, and strategies for lexer error handling.
Changes:
- Add a conflict resolution guide explaining shift/reduce and reduce/reduce conflicts and Alpaca's `before`/`after` DSL.
- Add a contextual parsing guide describing `LexerCtx`, `ParserCtx`, and context-driven lexing patterns.
- Add a lexer error handling guide describing catch-all token strategies and continuing after invalid input.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| docs/_docs/guides/lexer-error-handling.md | New guide for resilient lexing patterns (catch-all token, counting errors, ignoring invalid chars). |
| docs/_docs/guides/contextual-parsing.md | New guide for context-driven lexing/parsing and how state flows through lexer → lexemes → parser. |
| docs/_docs/guides/conflict-resolution.md | New guide describing conflict types and how to resolve them using Alpaca’s conflict resolution DSL. |
```scala
case class ErrorCtx(
  var text: CharSequence = "",
  var errorCount: Int = 0
) extends LexerCtx

val myLexer = lexer[ErrorCtx]:
  case "[a-z]+" => Token["ID"]
  case "\\s+" => Token.Ignored

  case x @ "." =>
    ctx.errorCount += 1
    println(s"Error: Unexpected character '$x' at position ${ctx.position}")
    Token.Ignored // Skip the character
```
The ErrorCtx example logs ${ctx.position}, but position is not a member of LexerCtx unless the context mixes in PositionTracking (or uses LexerCtx.Default). Update the example context definition accordingly so it compiles and matches the described behavior.
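A fix in the spirit of this comment might look like the following sketch (assuming `PositionTracking` is the mixin that contributes the `position` field, as the comment implies):

```scala
// Hypothetical corrected context: mix in PositionTracking so that
// ctx.position is actually available inside the lexer rules.
case class ErrorCtx(
  var text: CharSequence = "",
  var errorCount: Int = 0
) extends LexerCtx, PositionTracking
```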
```scala
case "\(" =>
  ctx.stack.push("paren")
  Token["("]
case "\)" =>
```
These lexer patterns use "\(" and "\)" (single backslash). In Scala string literals \( / \) are invalid escape sequences; if the intent is to match literal parentheses in a regex, the strings should be escaped as "\\(" and "\\)" (or written using triple-quoted strings).
Suggested change:

```scala
case "\\(" =>
  ctx.stack.push("paren")
  Token["("]
case "\\)" =>
```
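As a library-free aside, the escaping rule behind this suggestion can be checked in plain Scala: the double backslash in the string literal yields a single backslash, which the regex engine then reads as an escape for the literal parenthesis.

```scala
import scala.util.matching.Regex

// "\\(" is the two characters \ and ( — a regex matching a literal '('.
val paren: Regex = "\\(".r
assert(paren.matches("("))
assert(!paren.matches("x"))
```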
```scala
case x @ "
+" =>
```
The indentation lexer example has a broken multi-line string literal for the newline+spaces pattern (case x @ " on one line and +" on the next). As written, this is not valid Scala and will be confusing to readers; represent the pattern as a valid single-line string (e.g., using \n and escaped backslashes) or a properly delimited triple-quoted string.
Suggested change:

```scala
case x @ "\\n +" =>
```
```scala
// id is a Lexeme, which has a .fields property
// fields contains all members of your LexerCtx
println(s"Matched ID at line ${id.fields.line}")
```
This section states that a Lexeme has a .fields property and shows id.fields.line, but fields is not publicly accessible on alpaca.internal.lexer.Lexeme (it’s private[alpaca]). Readers should access captured context fields via the lexeme’s dynamic members (e.g., id.line, id.position, id.text) or whatever the intended public API is.
Suggested change:

```scala
// id is a Lexeme; captured context fields are exposed as dynamic members
// e.g. if your LexerCtx has a `line` field, you can access it as `id.line`
println(s"Matched ID at line ${id.line}")
```
```scala
case """ =>
  ctx.inString = !ctx.inString
  Token["QUOTE"]

case "[a-z]+" if !ctx.inString => Token["KEYWORD"]
case "[^"]+" if ctx.inString => Token["STRING_CONTENT"]
```
The mode-switching example has invalid Scala string literals: case """ => is an unterminated triple-quoted string, and the regex pattern "[^"]+" contains an unescaped quote. Please rewrite these patterns using valid Scala literals (often easiest with properly delimited triple-quoted strings) so the example can be copied verbatim.
Suggested change:

```scala
case "\"" =>
  ctx.inString = !ctx.inString
  Token["QUOTE"]
case "[a-z]+" if !ctx.inString => Token["KEYWORD"]
case """[^"]+""" if ctx.inString => Token["STRING_CONTENT"]
```
## 5. The `BetweenStages` Hook

The `BetweenStages` hook is the internal engine that powers context updates. It is a function called by Alpaca after **every** token match (including `Token.Ignored`) but **before** the next match starts.

### Automatic Updates

By default, Alpaca uses `BetweenStages` to automatically update the `text` field in your context. If your context extends `LineTracking` or `PositionTracking`, it also increments `line` and `position` counters.

### Customizing `BetweenStages`

If you need complex logic to run after every match, you can provide a custom `given` instance of `BetweenStages`.

```scala
given MyBetweenStages: BetweenStages[MyCtx] with
  def apply(token: Token[?, MyCtx, ?], matcher: Matcher, ctx: MyCtx): Unit =
    // Custom global logic
    println(s"Just matched ${token.info.name}")
```
The guide suggests customizing BetweenStages via a user-provided given, but BetweenStages is currently declared private[alpaca] (see src/alpaca/internal/lexer/BetweenStages.scala), so downstream users can’t reference or implement it. Either expose BetweenStages as part of the public API (or provide a public hook) or adjust the documentation to reflect the supported customization mechanisms (e.g., mixing in LineTracking/PositionTracking).
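If the docs are revised along these lines, the supported route would be the tracking mixins rather than a custom `given`. A sketch, reusing only names that appear in the guide (`LexerCtx`, `LineTracking`, `PositionTracking`, `lexer`, `Token`):

```scala
// Instead of customizing the private BetweenStages hook, mix the
// tracking traits into the context; per the guide, Alpaca then
// maintains the line and position counters after every match.
case class MyCtx(var text: CharSequence = "")
  extends LexerCtx, LineTracking, PositionTracking

val myLexer = lexer[MyCtx]:
  case "[a-z]+" => Token["ID"]   // ctx.line / ctx.position stay up to date
  case "\\s+"   => Token.Ignored
```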
```scala
val resilientLexer = lexer:
  case "[0-9]+" => Token["NUM"]
  case "\s+" => Token.Ignored
```
In the resilient lexer example, the whitespace regex is written as "\s+" (single backslash). In a Scala string literal this is an invalid escape sequence; use "\\s+" (or a triple-quoted string) to represent the \s+ regex correctly.
Suggested change:

```scala
case "\\s+" => Token.Ignored
```
📊 Test Compilation Benchmark

Result: Current branch is 4.227s slower (8.68%)
🤖 Generated with Claude Code